A Proof Theory for Loop-Parallelizing Transformations
Author
Abstract
The microprocessor industry has embraced multicore architectures as the new dominant design paradigm. Harnessing the full power of such computers requires writing multithreaded programs, but whether one is writing a program from scratch or porting an existing single-threaded program, concurrency is hard to implement correctly and often reduces the legibility and maintainability of the source code. Single-threaded programs are easier to write, understand, and verify. Parallelizing compilers offer one solution by automatically transforming sequential programs into parallel programs. Assisting the programmer with challenging tasks like this (and with other optimizations), however, makes compilers highly complex, leading to bugs that introduce unexpected behaviors into compiled programs in ways that are very difficult to test. Formal compiler verification attaches a rigorous mathematical proof of correctness to a compiler, providing high assurance that successfully compiled programs preserve the behaviors of the source program and that no bugs are introduced. However, no parallelizing compiler has been formally verified.

We lay the groundwork for verified parallelizing compilers by developing a general theory for proving the soundness of parallelizing transformations. Using this theory, we prove the soundness of a framework of small, generic transformations that compose to build optimizations that are correct by construction. We demonstrate the framework by implementing several classic and cutting-edge loop-parallelizing optimizations: DOALL, DOACROSS, and Decoupled Software Pipelining. Two of our main contributions are the development and proof of a general parallelizing transformation and of a transformation that coinductively folds a transformation over a potentially nonterminating loop; these compose to parallelize loops. Our third contribution is an exploration of the theory behind the correctness of parallelization, in which we consider the preservation of nondeterminism and develop bisimulation-based proof techniques. Our proofs have been mechanically checked by the Coq Proof Assistant.
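To make the flavor of these optimizations concrete, here is a minimal DOALL sketch (my illustration of the general technique, not code from the dissertation; the function name scale and the use of OpenMP are assumptions for the example). A loop qualifies for DOALL when no iteration depends on any other, so its iterations may execute in parallel in any order:

/* Minimal DOALL sketch; compile with -fopenmp (GCC/Clang).
   Each iteration reads only b[i] and writes only a[i], so there are
   no cross-iteration dependences: a DOALL candidate. */
void scale(double *a, const double *b, int n, double k) {
    /* Sequential form: for (int i = 0; i < n; i++) a[i] = k * b[i];
       DOALL form: the same loop with iterations split across threads. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] = k * b[i];
}

DOACROSS and Decoupled Software Pipelining, by contrast, target loops whose iterations do communicate: DOACROSS overlaps iterations while synchronizing on the cross-iteration dependences, and DSWP splits the loop body into pipeline stages that run in separate threads.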
Similar articles
A Basis Approach to Loop Parallelization and Synchronisation
Loop transformation is a crucial step in parallelizing compilers. We introduce the concept of a positive coordinate basis for deriving loop transformations. The basis serves to find proper loop transformations that change the dependence vectors into the desired forms. We demonstrate how this approach can systematically extract maximal outer-loop parallelism. Based on the concept, we can also constr...
A Linear Algebraic View of Loop Transformations and Their Interaction
Although optimizing transformations have been studied for over two decades, the interactions between them are not well understood. This is particularly important for the success of parallelizing compilers. In order to deal with interactions, we view loop transformations as multiplication by a suitable matrix (a worked instance of this view appears after these listings). The transformations considered are loop interchange, permutation, reversal, hyperplane ...
Lecture Notes on Linear Cache Optimization & Vectorization, 15-411: Compiler Design
The big open questions in cache optimization are: how and when, in general, should loops be transformed? What is the best choice of loop transformation? Is there a common systematic picture? How do we gain speed by vectorizing and/or parallelizing loops once loop transformations have made some loops parallelizable? And, finally, how can we use fancier transformations for complicated problems?
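As a worked instance of the matrix view of loop transformations mentioned above (my own illustration, not text from the listed papers): for a doubly nested loop with iteration vector $(i, j)^T$, loop interchange is multiplication by the permutation matrix

$$T = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad T \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} j \\ i \end{pmatrix}.$$

The same matrix decides legality: every transformed dependence vector $Td$ must remain lexicographically positive. For $d = (1, -1)^T$, $Td = (-1, 1)^T$ is lexicographically negative, so interchange is illegal for that loop; for $d = (1, 0)^T$, $Td = (0, 1)^T$ stays positive, so interchange is safe.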
Journal:
Volume/Issue:
Pages: -
Published: 2014